Building a Script-Driven Telephony Engine
IVR requirements have become so complex that meeting them sometimes means writing custom scripts.
Ever since we started offering IVR for calling card and callback services back in 2004 or so, it has been a never-ending process of adding new features, changing behavior, creating one optional feature after another and sometimes rewriting everything from scratch. User requirements varied significantly, so we had to create a number of configuration files, and later configuration screens, so that our clients could set up their systems the way they preferred. All of it, of course, was reasonable: sometimes you want the system to announce your balance and sometimes you don't; sometimes you want it to announce the remaining time (hours and minutes, or minutes only); then somebody wants to autosave the caller ID so that users do not need to enter their PIN the next time they call. Then again, you need to provide an option to erase the saved caller ID via DTMF in case the call was made from a public phone. Then there were requests for shortcut keys for speed dialing, balance retrieval, recharge, language change, redial and so on. The list would have gone on forever had the calling card market itself not declined.
However, the story did not stop there. We got requests to create specialized IVRs that went beyond calling cards: IVRs for inbound call centers, customer support, various automated answering systems and so on. While we didn't have to build each case from scratch, each one still required a new module, which in the end was a relatively slow process.
Later came the idea of building a universal, GUI-based system that would let non-technical people put together the behavior they needed by clicking, dragging and dropping on the screen. While that works in theory, the approach has a couple of drawbacks:
First, it is still quite hard for a non-technical person to understand how to put the pieces together: when to play an IVR prompt, how to collect the keypress string from DTMF, how to branch on conditions, and so on.
Second, even a simple script needs many building blocks. The diagram quickly becomes so large that you cannot see the whole picture, and designing larger systems becomes nearly impossible.
So in the end we concluded that the best approach was to go back to the text editor and console and let people write their own scripts. Take a programming language that is easy and popular enough for most technically literate people to follow its syntax and logic, build a parser for it, add domain-specific functions and classes, and you have quite a powerful tool in your hands.
First, which language to choose? After some thought, the choice fell on JavaScript because of its popularity, availability, easy-to-understand structure and wide range of ready-made parsers. Consider this small snippet:
playMedia({
  source: "welcome_leave_message.wav"
});
recordMedia({
  target: "recording.wav",
  maxlen: 120,
  stop: "#"
});
It is quite readable, and the idea is immediately clear: the system plays a prerecorded greeting and then records a message of at most 2 minutes (maxlen is 120 seconds), which the caller can end early by pressing the pound key (#).
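To see how such a script maps onto telephony commands before any engine exists, one option is to run it against stub implementations of the domain functions. The sketch below does that with Node's built-in `vm` module. The stubs and the `{ cmd, ... }` command format are hypothetical, `maxlen` is assumed to be in seconds, and this direct execution is only a testing aid, not the parser-based approach this article describes.

```javascript
const vm = require("vm");

// Hypothetical stub host: instead of talking to a media server, each domain
// function records the control message it would have sent.
function createStubHost() {
  const commands = [];
  return {
    commands,
    playMedia(opts) {
      commands.push({ cmd: "play", source: opts.source });
    },
    recordMedia(opts) {
      // Assumption: maxlen is in seconds; stop is the DTMF key that ends recording.
      commands.push({ cmd: "record", target: opts.target, maxlen: opts.maxlen, stop: opts.stop });
    },
  };
}

// The script from the article, unchanged.
const script = `
playMedia({
  source: "welcome_leave_message.wav"
});
recordMedia({
  target: "recording.wav",
  maxlen: 120,
  stop: "#"
});
`;

const host = createStubHost();
// Besides the standard JavaScript built-ins, only the domain functions
// are visible to the script as globals.
const context = vm.createContext({
  playMedia: host.playMedia,
  recordMedia: host.recordMedia,
});
vm.runInContext(script, context);

console.log(host.commands.map((c) => c.cmd).join(", ")); // play, record
```

Keep in mind that the Node.js documentation states that `vm` is not a security mechanism, so scripts written by customers would need stronger isolation than this.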
Standard language constructs such as assignments, string functions and flow control can be used. For example, to detect the language from the caller ID prefix:
if (caller_id.startsWith("34")) {
  lang = "es";
}
Using one of the available JavaScript parsers, we get a fairly straightforward structure of the expressions. For example, parsing the snippet above with Esprima gives the following node for the if statement:
{
  type: "IfStatement",
  test: {
    type: "CallExpression",
    callee: {
      type: "MemberExpression",
      computed: false,
      object: {
        type: "Identifier",
        name: "caller_id"
      },
      property: {
        type: "Identifier",
        name: "startsWith"
      }
    },
    arguments: [
      {
        type: "Literal",
        value: "34",
        raw: "\"34\""
      }
    ]
  },
  consequent: {
    type: "BlockStatement",
    body: [
      {
        type: "ExpressionStatement",
        expression: {
          type: "AssignmentExpression",
          operator: "=",
          left: {
            type: "Identifier",
            name: "lang"
          },
          right: {
            type: "Literal",
            value: "es",
            raw: "\"es\""
          }
        }
      }
    ]
  }
}
The parser produces a JSON-like tree (an abstract syntax tree) that captures the structure of the entire script. This tree can then be fed into a state machine which, together with signals from the telephony engine, processes it block by block and issues the control messages that drive the call flow.
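To make this step more concrete, here is a minimal sketch of such a step-wise interpreter. It is not our prototype: it assumes an Esprima-style tree, handles only a handful of node types, and uses JavaScript generators as the state machine. Each call to a telephony function (the hypothetical set `DOMAIN_CALLS`) pauses the script and hands a command to the telephony side, which resumes it with the result. The tree is built by hand so the example runs without a parser installed; `run` and its `engine` callback are illustrative names.

```javascript
// Minimal sketch, not the production engine: a step-wise interpreter for an
// Esprima-style AST, using generators as the state machine. DOMAIN_CALLS is a
// hypothetical list of telephony functions that pause the script.
const DOMAIN_CALLS = new Set(["playMedia", "recordMedia", "collectDTMF"]);

function* evaluate(node, env) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Identifier":
      return env[node.name];
    case "AssignmentExpression":
      if (node.operator !== "=" || node.left.type !== "Identifier") {
        throw new Error("only simple `name = value` assignment is supported");
      }
      return (env[node.left.name] = yield* evaluate(node.right, env));
    case "ObjectExpression": {
      const obj = {};
      for (const p of node.properties) {
        obj[p.key.name ?? p.key.value] = yield* evaluate(p.value, env);
      }
      return obj;
    }
    case "CallExpression": {
      const args = [];
      for (const a of node.arguments) args.push(yield* evaluate(a, env));
      const callee = node.callee;
      if (callee.type === "Identifier" && DOMAIN_CALLS.has(callee.name)) {
        // Suspend here; the telephony engine resumes us with the result.
        return yield { command: callee.name, args };
      }
      if (callee.type === "MemberExpression" && !callee.computed) {
        // Plain method call on a value, e.g. caller_id.startsWith("34").
        const target = yield* evaluate(callee.object, env);
        return target[callee.property.name](...args);
      }
      throw new Error("unsupported call");
    }
    default:
      throw new Error(`unsupported expression: ${node.type}`);
  }
}

function* execute(node, env) {
  switch (node.type) {
    case "Program":
    case "BlockStatement":
      for (const child of node.body) yield* execute(child, env);
      break;
    case "ExpressionStatement":
      yield* evaluate(node.expression, env);
      break;
    case "IfStatement":
      if (yield* evaluate(node.test, env)) yield* execute(node.consequent, env);
      else if (node.alternate) yield* execute(node.alternate, env);
      break;
    case "WhileStatement":
      while (yield* evaluate(node.test, env)) yield* execute(node.body, env);
      break;
    default:
      throw new Error(`unsupported statement: ${node.type}`);
  }
}

// Synchronous driver: `engine` stands in for the telephony engine and
// returns the result of each command it receives.
function run(program, env, engine) {
  const steps = execute(program, env);
  let step = steps.next();
  while (!step.done) step = steps.next(engine(step.value));
  return env;
}

// Usage: the if statement shown above (callee spelled startsWith),
// followed by a playMedia call, built by hand instead of with a parser.
const id = (name) => ({ type: "Identifier", name });
const lit = (value) => ({ type: "Literal", value, raw: JSON.stringify(value) });
const call = (callee, args) => ({ type: "CallExpression", callee, arguments: args });
const exprStmt = (expression) => ({ type: "ExpressionStatement", expression });
const member = { type: "MemberExpression", computed: false, object: id("caller_id"), property: id("startsWith") };
const assign = { type: "AssignmentExpression", operator: "=", left: id("lang"), right: lit("es") };
const ifLang = {
  type: "IfStatement",
  test: call(member, [lit("34")]),
  consequent: { type: "BlockStatement", body: [exprStmt(assign)] },
};
const source = { type: "Property", key: id("source"), value: lit("welcome_leave_message.wav") };
const greet = exprStmt(call(id("playMedia"), [{ type: "ObjectExpression", properties: [source] }]));

const sent = [];
const env = run({ type: "Program", body: [ifLang, greet] },
  { caller_id: "34911234567", lang: "en" },
  (command) => { sent.push(command); });
console.log(env.lang); // es
```

Because the generator keeps the script's position between steps, a real engine could send each yielded command to the media server asynchronously and call `next()` only when the matching completion event (for example, end of playback or the collected digits) arrives.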
It is not the purpose of this article to go into the details of this process; however, we have built a working prototype that does what is described above. We have added several telephony-specific classes, for example:
- collect DTMF
- play media
- record media
- send SMS
as well as control flow structures:
- if/else
- while
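As an illustration of what one of the classes listed above might do internally, here is a minimal sketch of DTMF collection: it accumulates key presses until a terminator key is pressed, a maximum number of digits is reached, or an inter-digit timer fires. The class name `DtmfCollector` and its `maxDigits`/`terminator` options are hypothetical, not the prototype's actual API.

```javascript
// Hypothetical DTMF collector: accumulates key presses until a terminator
// key, a digit limit, or an inter-digit timeout ends the input.
class DtmfCollector {
  constructor({ maxDigits = 16, terminator = "#" } = {}) {
    this.maxDigits = maxDigits;   // hypothetical option: stop after this many digits
    this.terminator = terminator; // hypothetical option: key that ends input early
    this.digits = "";
    this.done = false;
  }

  // Feed one key press from the telephony engine; returns true once input is complete.
  press(key) {
    if (this.done) return true; // ignore keys after input has finished
    if (!/^[0-9*#A-D]$/.test(key)) throw new Error(`not a DTMF key: ${key}`);
    if (key === this.terminator) {
      this.done = true; // the terminator itself is not stored
    } else {
      this.digits += key;
      if (this.digits.length >= this.maxDigits) this.done = true;
    }
    return this.done;
  }

  // Called when the inter-digit timer expires: accept whatever was entered.
  timeout() {
    this.done = true;
  }
}

// A 4-digit PIN ended early with #.
const pin = new DtmfCollector({ maxDigits: 4, terminator: "#" });
for (const key of "12#") pin.press(key);
console.log(pin.digits, pin.done); // 12 true

// A 3-digit code: presses after the limit are ignored.
const code = new DtmfCollector({ maxDigits: 3 });
for (const key of "98765") code.press(key);
console.log(code.digits); // 987
```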
I will share more details of the project in upcoming posts. In the meantime, if you are interested in becoming a beta tester, please contact us via our Contact Form and describe your specific use case.